third-party library
How Robust are LLM-Generated Library Imports? An Empirical Study using Stack Overflow
Jasmine Latendresse, SayedHassan Khatoonabadi, Emad Shihab
Abstract: Software libraries are central to the functionality, security, and maintainability of modern code. As developers increasingly turn to Large Language Models (LLMs) for help with programming tasks, understanding how these models recommend libraries is essential. In this paper, we conduct an empirical study of six state-of-the-art LLMs, both proprietary and open-source, by prompting them to solve real-world Python problems sourced from Stack Overflow. We analyze the types of libraries they import, the characteristics of those libraries, and the extent to which the recommendations are usable out of the box. Our results show that LLMs predominantly favour third-party libraries over standard-library modules, and often recommend mature, popular, and permissively licensed dependencies. However, we also identify gaps in usability: 4.6% of the libraries could not be resolved automatically because of structural mismatches between import names and installable packages, and only two of the six models provided installation guidance. While the generated code is technically valid, this lack of contextual support leaves users to resolve dependencies manually. Our findings offer actionable insights for both developers and researchers, and highlight opportunities to improve the reliability and usability of LLM-generated code with respect to software dependencies.

Modern software development relies heavily on open-source libraries that provide reusable functionality through well-defined modules, significantly reducing development time and effort [1]-[3]. While libraries can speed up development, they also introduce dependencies (interconnections between code components) that can increase complexity and create dependency management challenges [4]-[6]. One critical aspect of dependency management is library selection [7].
Previous studies have explored how developers select libraries and found the process to be largely ad hoc, based on past experience, expert advice, and online resources [8], [9]. Decisions around dependency adoption are influenced by factors such as functionality, community support, and maintenance compatibility [7], [10], [11]. In parallel, the growing adoption of LLMs as programming assistants introduces new possibilities for addressing these challenges. LLMs are increasingly used for code generation, and studies show their potential to enhance productivity through capabilities such as code completion and code search [24]-[26]. However, their impact on software dependencies (i.e., their ability to generate reliable library imports) remains unexplored.
Defending smart systems at the machine learning framework level
While smart cities and smart homes have become mainstream buzzwords, few people outside the IT and machine learning communities know about TensorFlow, PyTorch, or Theano. These are the open-source machine learning (ML) frameworks on which smart systems are built, used among other things to integrate Internet of Things (IoT) devices. ML algorithms and code are often found in publicly available repositories, or data stores, that draw heavily on these frameworks. In a December 2019 analysis of the code hosting site GitHub, SMU Professor of Information Systems David Lo found over 46,000 repositories that depended on TensorFlow and over 15,000 that used PyTorch. Because these frameworks are so popular, any vulnerability in them could be exploited to cause widespread damage.
Python Packages for Data Science - DZone Big Data
Python is one of the most widely used programming languages. Although standard Python offers relatively little on its own, its enormous number of open-source and third-party libraries keeps it popular among developers. Name a domain, and Python likely has well-established packages and libraries for it. Data science and machine learning are two of the most in-demand technologies of this era, and Python excels in both fields. Apart from Python, R is another programming language that is often used in data science projects. R is faster and has more computational and statistical libraries; however, this article covers only the top Python data science libraries you should know if you want to master data science.
Fake Third-Party Python Libraries Are Stealing Information
Two fake libraries were removed from the Python Package Index (PyPI) after a German developer, Lukas Martini, reported that the packages were stealing critical information. Python was released almost three decades ago, but it has been widely embraced only in the last few years, driven by the growth of third-party libraries for artificial intelligence and data science. However, these very libraries could become the main cause of Python's downfall. This is not the first time PyPI has been infiltrated by packages that extract information: earlier incidents occurred in July 2019, October 2018, and September 2017. The attackers used typosquatting, a form of cybersquatting that exploits typos made by users, to deceive developers and gain access to sensitive data. The idea behind this technique is to register a look-alike of a genuine package name, so that a developer who makes a typo may install the phony package instead of the intended one.
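The look-alike trick can be illustrated with a small check that flags a requested package name when it is close to, but not the same as, a well-known name. The sketch normalizes names the way PyPI compares them (PEP 503: lowercase, with runs of `-`, `_`, and `.` treated as a single `-`) and uses `difflib.SequenceMatcher` similarity as a stand-in for whatever metric a real scanner would use. The `POPULAR` list, the 0.8 threshold, and the function names are hypothetical.

```python
import difflib
import re

# Hypothetical list of well-known package names to protect.
POPULAR = ("requests", "numpy", "pandas", "urllib3", "python-dateutil", "jellyfish")


def normalize(name: str) -> str:
    """PEP 503 normalization: lowercase, and runs of '-', '_', '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def typosquat_suspects(requested: str, known: tuple[str, ...] = POPULAR,
                       threshold: float = 0.8) -> list[str]:
    """Return known packages that `requested` closely resembles without matching.

    An exact match after normalization is treated as the genuine package.
    """
    req = normalize(requested)
    normalized_known = {normalize(k): k for k in known}
    if req in normalized_known:
        return []
    suspects = []
    for norm, original in normalized_known.items():
        # Similarity in [0, 1]; 1.0 means identical strings.
        if difflib.SequenceMatcher(None, req, norm).ratio() >= threshold:
            suspects.append(original)
    return suspects


print(typosquat_suspects("Requests"))          # []
print(typosquat_suspects("reqeusts"))          # ['requests']
print(typosquat_suspects("python3-dateutil"))  # ['python-dateutil']
```

String similarity alone also flags legitimate packages whose names happen to be close to popular ones, so the threshold and package list here are only illustrative.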